The command-line interface (CLI) and graphical user interface (GUI) are two different ways of interacting with a computer’s operating system. Most people are familiar with the GUI, as it is the default interface for most software. In a GUI, you interact with visual representations of files, folders, applications, and other elements. In contrast, the CLI involves working primarily with text-based representations of files, folders, input, and output.
Most data processing and remote access will be command-line based. For this we need an interpreter.
A shell is a command-line interpreter that allows users to type commands to launch programs.
The most popular UNIX shell is BASH (the Bourne Again SHell) — so named because it is derived from a shell written by Stephen Bourne. Learning to use the shell requires time and effort.
While a GUI presents you with choices to select, CLI does not automatically display these options to you. Instead, you will need to learn specific commands. This will resemble learning a new language!
Using the shell provides access to internal system controls, remote servers, and customizable workflows through scripting. With the shell, you can create, edit, and delete files, as well as perform many other tasks efficiently.
The command line is often the easiest and most effective way to interact with remote machines and supercomputers. Familiarity with the shell is essential for running specialized tools and accessing high-performance computing (HPC) resources. As clusters and cloud computing systems become increasingly important for scientific data analysis, proficiency in the shell has become a critical skill. The command-line skills introduced here will enable you to address a wide range of scientific questions and computational challenges.
On Mac or Linux machines, you can access the shell locally through a program called Terminal. However, for simplicity and convenience, we will use the shell through the open-source web portal Vermont Advanced Computing Center - Open OnDemand (VACC-OOD). Once you open the terminal, you will begin learning the basics of shell programming with the Bourne Again Shell (Bash).
Most data-processing tasks in bioinformatics require more computing power than we have on our workstations. For all bioinformatics projects performed in this course, you will work over a network connection with the VACC.
Cluster Basics
Nodes are generally classified by their roles: login nodes and compute nodes. Login nodes are used for accessing the cluster, setting up jobs, and managing workflows, while compute nodes handle the actual computational analysis or work. Most clusters have a few login nodes and many compute nodes to efficiently handle diverse workloads.
Common characteristics of a Cluster:
As of March 2022, the VACC provides three Clusters:
Please note that more information can always be found at the Vermont Advanced Computing Center website.
Each student has been provided with their own personal VACC account that they can use to access VACC-Open OnDemand (OOD).
Open OnDemand (OOD) is an open source web portal for high performance computing (HPC) that provides users with an easy-to-use web interface to HPC clusters.
Benefits of using OOD:
Web-based, no additional software needs to be installed on your local machine
The easiest way to run graphical user interface (GUI) applications remotely on a cluster
Typical computing with command-line requires a high learning curve whereas OOD is easy to use and simple to learn
GUI applications offered by VACC-OOD:
Use the VACC-OOD link to access the site
Add your UVM netid and password
You should be viewing the following dashboard
To access the Terminal Go to and click
>_VACC Shell Access
More information about VACC-OOD usage can be found here VACC_OOD-start-guide
If you already had a VACC account and/or are currently working towards generating and analyzing your own data, you may want to learn to log-in without VACC-OOD.
To do so, first open your terminal locally on your computer.
Once you open your terminal, your screen should look similar to below:
You see the “$” symbol?
That is where you write the “commands” that will be executed by shell (bash in this case) and your computer’s kernel.
The “$” is called the “command prompt”.
There are many ways to connect to another machine over a network, but by far the most common is through the secure shell (SSH). We use SSH because its encrypted. This makes it secure to send passwords and edit private data files.
Step by step instructions can be found here at the VACC_start_guide
To use OFF-campus you will need to VPN first. See install-cisco-vpn for more information!
Now that we are logged-in to the VACC, lets explore terminal. Your screen should look similar to the following:
To run a program, we will follow these basic steps:
The “$” is called the “command prompt”.
The command prompt on VACC will have some characters before the
$, something like [username@vacc-user1 ~],
this is telling you your username and the name of the login node you
have connected to.
The dollar sign is a prompt which shows us that the shell is waiting for input. Moving forward, when typing commands, either from these lessons or from other sources, do not type in the command prompt $, only the command that follows it.
The first thing to do is to check if there are any files in the data
folder we are currently in. When you log in to a cluster, you will land
within a folder designated specifically for your use, and is referred to
as your “home directory”. We will begin by listing the contents of our
home directory using a command called ls.
ls
Tip -
lsstands for “list” and it lists the contents of a directory.
First let’s remove this directory from last semester:
rm -r unix_lesson
Now let’s bring in a data folder from a different location on the
cluster to our home directory by using the cp command.
Copy and paste the following command all the way from
cp and including the period symbol at the end
.
cp -r /gpfs1/cl/mmg3320/course_materials/tutorials/unit1_unix .
Let’s break this down.
‘cp’ is the command for copy. This command required you to specify the location of the item you want to copy (/gpfs1/cl/mmg3320/course_materials/tutorials/unit1_unix) and the location of the destination (.); please note the space between the two in the command. The “-r” is an option that modifies the copy command to do something slightly different than usual. The “.” means “here”, i.e. the destination location is where you currently are.
Now, you should see “unit1_unix” show up as the output of
ls. This is a folder we should all have in our home
directory.
ls
Let’s look at what is inside the folder “unit1_unix” and explore this further. We are use to clicking on a folder name to open it, however, now we are forced to change our mindset and open a folder or “directory” differently within the shell environment.
To look inside the unit1_unix directory, we need to
change which directory we are in. To do this we can use the
cd command, which stands for “change directory”.
cd unit1_unix
Notice the change in your command prompt. The “~” symbol from before
should have been replaced by the string unit1_unix. This
means that our cd command ran successfully and we are now
in the new directory. Let’s see what is in here by listing the
contents:
ls
You should see:
## genomics_data other raw_fastq README.txt reference_data
Notice that ls has printed the name of the files and
directories in the current directory in alphabetical order, arranged
neatly into columns.
There are five items listed when you run ls, but how do
you know if these are files or directories with more items inside?
To answer this question, we can modify the default behavior of
ls by adding an “argument” to get more
information.
ls -F
## genomics_data/ other/ raw_fastq/ README.txt reference_data/
Anything with a “/” after its name is a directory. Things with an asterisk “*” after them are programs. If there are no “decorations” after the name, it’s a normal text file.
Each line of output represents a file or a directory. The directory
lines start with d.
Most commands will take additional arguments that control their
behavior. How do we know what arguments are available for a particular
command? The most commonly used shell commands have a manual available
that can be accessed using the man command. Let’s try this
command with ls:
man ls
This will open the manual page for ls and you will lose
the command prompt. It will bring you to a so-called “buffer” page, a
page you can navigate with your mouse or if you want to use your
keyboard we have listed some basic key strokes:
To get out of the man “buffer” page and to be
able to type commands again on the command prompt, press the
q key!
!!! example “Class Exercise”
Use the `-l` option for the `ls` command to display more information
for each item in the `unit1_unix` folder. What additional information
is provided that you didn't see with the bare `ls` command?
cd
+ Change Directory
+ used to move throughout the filesystem of a computer
ls
+ List
+ list the contents of a directory
rm
+ Remove
+ used to remove a file
This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
*Other parts of this lesson were derived from:
Erin Alison Becker, Anita Schürch, Tracy Teal, Sheldon John McKay, Jessica Elizabeth Mizzi, François Michonneau, et al. (2019, June). datacarpentry/shell-genomics: Data Carpentry: Introduction to the shell for genomics data, June 2019 (Version v2019.06.1). Zenodo. http://doi.org/10.5281/zenodo.3260560
Babraham Bioinformatics - Training Courses. (n.d.). https://www.bioinformatics.babraham.ac.uk/training.html#unix